[00:08.070 --> 00:13.890]  Hello, everyone. Thank you so much for coming to listen to this talk.
[00:13.890 --> 00:18.590]  And I am very honored to be here to present this talk, Fully Immune System,
[00:20.490 --> 00:24.570]  and to learn from my two professors, Prof. Wen and Prof. Wang.
[00:25.890 --> 00:28.550]  And I am a lecturer from the Chinese University.
[00:28.890 --> 00:30.070]  I am from the Department of Neuroscience.
[00:34.270 --> 00:39.770]  And I am going to cover six aspects in this talk.
[00:39.770 --> 00:41.190]  First, speech recognition.
[00:41.190 --> 00:42.890]  Second, technical background.
[00:43.390 --> 00:46.650]  Third, method.
[00:47.310 --> 00:50.990]  Fourth, attacking Google image search engine.
[00:50.990 --> 00:53.610]  And finally, discussion.
[00:55.330 --> 01:10.250]  An image search engine is a service that lets users search its database for similar images
[01:10.250 --> 01:10.870]  by uploading an image or supplying an image URL.
[01:13.390 --> 01:19.450]  This service can be called reverse image search or image search.
[01:22.550 --> 01:25.430]  Reverse image search is a type of CBIR.
[01:25.430 --> 01:29.930]  CBIR stands for content-based image retrieval, that is, image search based on image content.
[01:30.550 --> 01:32.800]  And this is...
[01:37.070 --> 01:41.330]  This is a sample image provided to the CBIR system.
[01:42.570 --> 01:47.870]  And it will search for images similar to this sample image.
[01:47.870 --> 01:51.870]  And the process is like this.
[01:53.850 --> 02:00.850]  You can search for similar images this way on Google, Baidu, or other image search engines.
[02:05.610 --> 02:10.730]  And this reverse image search engine has many uses.
[02:10.730 --> 02:18.910]  One of the more notable uses is to detect plagiarism.
[02:20.130 --> 02:27.330]  Google image search and Baidu image search both provide this way of tracing an image back to its source.
[02:27.330 --> 02:29.730]  This is what we call CBIR search.
[02:29.730 --> 02:36.730]  And a funny example of plagiarism detection is...
[02:36.730 --> 02:41.610]  In 2016, a TV series, Freehouse, published a poster.
[02:41.610 --> 02:47.590]  This poster plagiarized an original picture made by the author, Lu He, in 2014.
[02:47.590 --> 02:50.870]  The series was accused of plagiarism and finally apologized.
[02:50.870 --> 02:55.590]  But is there any infringement in this picture?
[02:55.590 --> 02:59.090]  And what method can we use to detect it?
[02:59.090 --> 03:03.810]  We upload this poster to Google image search engine.
[03:03.810 --> 03:09.170]  And the original picture can be found in the search result.
[03:09.170 --> 03:12.810]  This is obviously plagiarism.
[03:12.810 --> 03:19.190]  But a CBIR system also faces a potential threat of its own.
[03:19.190 --> 03:29.930]  That is, attackers can add some small visual perturbations to the original picture
[03:29.930 --> 03:35.170]  and make the search engine unable to find the similar images.
[03:36.590 --> 03:39.150]  For example, I made a DEF CON poster.
[03:39.150 --> 03:40.770]  I also used its original picture.
[03:40.770 --> 03:45.690]  If you upload it to Google image search, you can obviously find its original picture.
[03:45.690 --> 03:48.170]  So the plagiarism can be detected.
[03:48.170 --> 03:52.230]  But if you add some small visual perturbations, the search can no longer detect the plagiarism.
[03:52.570 --> 03:57.690]  To explain this trick, let's start with the CBIR system framework.
[03:58.130 --> 04:04.090]  A CBIR system is a two-stage system.
[04:04.410 --> 04:08.790]  The first stage is the indexing stage of the image database.
[04:08.790 --> 04:12.870]  Features are extracted from each image in the database.
[04:17.790 --> 04:23.510]  So each image is converted into a set of feature descriptors.
[04:23.510 --> 04:29.190]  And these feature descriptors are stored in the database.
[04:30.690 --> 04:35.570]  The second stage is the query: when a user uploads a search image,
[04:35.570 --> 04:42.110]  the system uses the same algorithm to extract its feature descriptors.
[04:42.110 --> 04:51.130]  These are then compared with the stored descriptors to find the closest image.
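A minimal sketch of the two-stage framework just described, assuming a toy descriptor in place of a real feature extractor. The names `extract_descriptor`, `build_index`, and `query` are hypothetical and are not from the talk; the 4-bin histogram only stands in for the real features so that the example stays self-contained.

```python
import math

def extract_descriptor(image):
    """Toy stand-in for a real feature extractor (assumption, for illustration).

    `image` is a list of rows of grayscale values in 0..255. The descriptor
    is a normalized 4-bin intensity histogram.
    """
    bins = [0, 0, 0, 0]
    count = 0
    for row in image:
        for value in row:
            bins[min(value // 64, 3)] += 1
            count += 1
    return [b / count for b in bins]

def build_index(database):
    """Stage 1 (indexing): extract and store a descriptor for every image."""
    return {name: extract_descriptor(img) for name, img in database.items()}

def query(index, image):
    """Stage 2 (query): extract with the same algorithm, return the closest entry."""
    q = extract_descriptor(image)
    return min(index, key=lambda name: math.dist(index[name], q))

database = {
    "dark": [[10, 20], [30, 40]],
    "bright": [[200, 210], [220, 250]],
}
index = build_index(database)
print(query(index, [[15, 25], [35, 60]]))  # prints: dark
```

The point of the sketch is that the query never compares pixels directly: it only compares descriptors, which is why an attack on the descriptors can defeat the search.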
[04:52.270 --> 05:01.090]  The two most commonly used and most influential features are SIFT and SURF.
[05:01.090 --> 05:04.230]  These two are classic, long-established features.
[05:04.230 --> 05:08.230]  As you can see from the CBIR framework,
[05:08.850 --> 05:13.750]  image search is a process of comparing descriptions of images.
[05:15.810 --> 05:18.030]  Let's start with these two features.
[05:18.490 --> 05:22.370]  SIFT is the more precise feature.
[05:22.370 --> 05:27.510]  And SURF is the faster feature.
[05:27.510 --> 05:35.130]  Both features are built on a stack of images at different scales.
[05:35.130 --> 05:37.530]  This is called the scale space.
[05:37.530 --> 05:41.630]  And the feature points are searched for across the different scales.
[05:47.450 --> 05:52.350]  A point is classified as a candidate feature point
[05:52.350 --> 05:53.730]  if it meets two conditions.
[05:53.730 --> 05:56.930]  The first condition is that, within the scale space,
[05:56.930 --> 06:02.990]  the point has the highest value among its 26 neighboring points.
[06:02.990 --> 06:09.830]  The second condition is that its value is greater than the algorithm's preset threshold.
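The two conditions above can be sketched as follows. This is an illustration under stated assumptions, not the speaker's code: `is_candidate` and its arguments are hypothetical names, the response maps are plain nested lists, and only maxima are tested, as in the talk (the published SIFT detector also accepts minima).

```python
def is_candidate(scale_space, s, y, x, threshold):
    """Return True if the point at scale s, row y, column x passes both tests.

    `scale_space` is a list of 2-D response maps (lists of rows), one per
    scale. The point must not lie on the boundary of the stack.
    """
    value = scale_space[s][y][x]
    # Second condition from the talk: the value must exceed the preset threshold.
    if value <= threshold:
        return False
    # First condition: the value must be larger than all 26 neighbours
    # (8 in its own scale, 9 in the scale above, 9 in the scale below).
    for ds in (-1, 0, 1):
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                if (ds, dy, dx) == (0, 0, 0):
                    continue
                if scale_space[s + ds][y + dy][x + dx] >= value:
                    return False
    return True

# Three scales of 3x3 responses with a single peak in the middle.
stack = [[[0.0] * 3 for _ in range(3)] for _ in range(3)]
stack[1][1][1] = 5.0
print(is_candidate(stack, 1, 1, 1, threshold=1.0))
```

Both tests matter for the attack discussed later: lowering a point's value below the threshold, or raising a neighbour above it, is enough to make the detector drop the point.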
[06:10.990 --> 06:13.750]  After this point is identified as a feature point,
[06:14.110 --> 06:23.010]  a descriptor vector is computed from the relationship between this point and its surrounding neighborhood to represent the feature point.
[06:23.010 --> 06:27.310]  The SIFT descriptor has 128 dimensions.
[06:27.310 --> 06:30.890]  And the SURF descriptor has 64 dimensions.
[06:31.410 --> 06:37.330]  Both the SIFT and SURF algorithms are built on a scale space.
[06:42.330 --> 06:56.510]  The scale space is built with Gaussian filters of different sizes.
[06:56.510 --> 07:00.930]  The filtered images are organized into different scales.
[07:13.490 --> 07:18.910]  The SURF algorithm is similar in principle.
[07:19.090 --> 07:23.850]  But the value of each point is not the DoG (difference of Gaussians) value.
[07:23.850 --> 07:27.550]  Instead, the DoG value is replaced by a Hessian value.
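For reference, these are the standard definitions of the two response values as given in the SIFT and SURF literature. They are not taken from the speaker's slides, and the symbols are the conventional ones: $I$ is the image, $G$ a Gaussian kernel, $k$ the scale step, $D_{xx}, D_{yy}, D_{xy}$ the box-filter approximations of the second derivatives, and $w \approx 0.9$ a balancing weight.

```latex
% SIFT: difference of Gaussians (DoG) at scale \sigma
L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)
D(x, y, \sigma) = L(x, y, k\sigma) - L(x, y, \sigma)

% SURF: approximated determinant of the Hessian
\det(H_{\text{approx}}) = D_{xx} D_{yy} - (w \, D_{xy})^{2}, \qquad w \approx 0.9
```

In both cases the detector thresholds a single scalar response per point, which is the quantity a removal or injection attack has to push below or above the threshold.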
[07:30.710 --> 07:34.530]  I will explain the formula in detail later.
[07:35.310 --> 07:40.170]  SIFT and SURF feature points are very robust.
[07:42.170 --> 07:47.350]  Both SIFT and SURF feature points are invariant to rotation and scaling.
[07:47.350 --> 07:50.970]  A picture can be rotated or shrunk.
[07:50.970 --> 07:54.610]  Or it can be cropped.
[07:54.970 --> 07:58.430]  Even so, the picture can still be found in the image database.
[08:01.150 --> 08:19.110]  We propose a method to minimize the similarity between the modified picture and the original in terms of SIFT and SURF feature points.
[08:27.210 --> 08:34.150]  We can do this by removing SIFT and SURF feature points from the picture.
[08:34.150 --> 08:41.930]  We need to emphasize that the picture must remain visually usable.
[08:41.930 --> 08:48.930]  If we add a very small perturbation to the image, that is acceptable.
[08:51.870 --> 08:57.490]  But if we add a very large perturbation, such as in the image on the right, it will look very different from the original image.
[08:58.090 --> 08:59.870]  This is unacceptable.
[09:01.050 --> 09:05.890]  For a human observer, the visual meaning of the image is very important.
[09:09.210 --> 09:14.770]  The security of SIFT feature points has been studied before.
[09:15.710 --> 09:20.090]  Many people have proposed methods to remove SIFT feature points.
[09:20.570 --> 09:23.930]  We use the RMD algorithm.
[09:24.470 --> 09:30.070]  The RMD algorithm is a faster and more effective way to remove SIFT feature points.
[09:32.230 --> 09:43.090]  To distinguish it from the SURF version of the algorithm, we call it RMD-SIFT.
[09:46.310 --> 09:49.050]  The effect of RMD-SIFT can be seen below.
[09:49.370 --> 09:53.830]  A picture that had 187 feature points is left with 73 feature points.
[09:56.410 --> 10:04.890]  In our research, we did not find an existing method to remove SURF feature points.
[10:04.890 --> 10:10.870]  So we proposed a method similar to the one for SIFT feature points.
[10:13.190 --> 10:16.330]  It is called RMD-SURF.
[10:16.330 --> 10:22.950]  It is a process based on optimization.
[10:24.730 --> 10:26.810]  The formula is also listed below.
[10:28.470 --> 10:30.910]  This algorithm is very effective.
[10:30.910 --> 10:35.110]  And it introduces very little perturbation.
[10:36.090 --> 10:43.630]  But we do not remove feature points at the high levels of the scale space,
[10:46.070 --> 10:49.010]  because that would cause a lot of perturbation to the picture.
[10:49.010 --> 10:52.380]  So we only remove feature points at the low levels.
[10:54.290 --> 11:01.390]  To evaluate the RMD algorithm, we use VisualIndex.
[11:01.390 --> 11:04.910]  You can download it from the author's homepage.
[11:05.990 --> 11:10.410]  The picture on the left is the result of the RMD algorithm.
[11:11.990 --> 11:17.050]  The most similar picture found by this system is the one on the right.
[11:17.050 --> 11:19.330]  The results are shown as a curve.
[11:19.710 --> 11:23.110]  This experiment shows that for a simple picture,
[11:24.350 --> 11:26.830]  a picture with few feature points,
[11:27.950 --> 11:32.990]  removing feature points can achieve the bypass.
[11:32.990 --> 11:36.270]  But for a picture with a lot of feature points,
[11:36.270 --> 11:40.750]  removal may not be enough.
[11:40.750 --> 11:42.110]  Because if you look at the first picture,
[11:42.110 --> 11:55.790]  removal cannot reduce the similarity far enough.
[11:55.910 --> 12:00.230]  And if you push the removal up to level 60,
[12:00.230 --> 12:02.490]  it can achieve the bypass,
[12:02.490 --> 12:07.210]  but it cannot keep the visual meaning.
[12:09.370 --> 12:14.690]  So using removal alone is not very reliable.
[12:15.750 --> 12:21.670]  So in order to increase the difference between the two pictures,
[12:21.670 --> 12:23.890]  and to keep the visual meaning,
[12:25.170 --> 12:30.010]  we came up with a method to insert feature points.
[12:31.550 --> 12:34.190]  There are two questions that need to be answered.
[12:34.190 --> 12:37.830]  First, where do we insert the feature point?
[12:38.110 --> 12:40.650]  We can insert it into the center of the picture,
[12:40.650 --> 12:43.050]  or we can insert it into the border of the picture.
[12:43.090 --> 12:47.530]  The second question is, how is the feature point generated?
[12:47.530 --> 12:54.630]  The first idea is to generate feature points with an algorithm that reverses the removal.
[12:54.630 --> 13:01.750]  The second idea is to add some Basic Bricks around the feature point.
[13:01.750 --> 13:06.030]  A Basic Brick is a basic unit.
[13:06.250 --> 13:08.430]  A feature point can be generated within this unit.
[13:08.430 --> 13:10.950]  So if you insert Basic Bricks,
[13:10.950 --> 13:15.430]  you can generate as many feature points as possible in a relatively small space.
[13:17.250 --> 13:19.570]  Let me introduce the injection algorithm.
[13:21.790 --> 13:25.690]  It is the reverse operation of the RMD algorithm.
[13:31.170 --> 13:37.590]  It takes a candidate point together with its value and some basic information.
[13:37.590 --> 13:43.270]  If the value of the point is greater than the threshold,
[13:43.270 --> 13:46.670]  we can already count this point as a feature point.
[13:46.670 --> 13:48.170]  If the value of the point is less than the threshold,
[13:48.170 --> 13:51.130]  we can add a patch around the point.
[13:51.130 --> 13:55.970]  Then the value of the point becomes greater than the threshold.
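A minimal sketch of this injection step, under loud assumptions: the detector response here is a toy one (centre pixel minus the mean of its 8 neighbours), not the DoG or Hessian response the real detectors use, and `response`, `inject_point`, and `margin` are hypothetical names. Pixel clipping to a valid range is ignored.

```python
def response(image, y, x):
    """Toy detector response: centre pixel minus the mean of its 8 neighbours."""
    neighbours = [image[y + dy][x + dx]
                  for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                  if (dy, dx) != (0, 0)]
    return image[y][x] - sum(neighbours) / 8.0

def inject_point(image, y, x, threshold, margin=1.0):
    """Return a copy of `image` in which (y, x) responds above `threshold`."""
    patched = [row[:] for row in image]
    current = response(image, y, x)
    if current > threshold:
        return patched  # already counts as a feature point, nothing to add
    # Patch shape: +1 at the centre, -1/8 on each neighbour. One unit of
    # patch strength raises the toy response by 1 + 1/8 = 1.125.
    strength = (threshold - current + margin) / 1.125
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            weight = 1.0 if (dy, dx) == (0, 0) else -0.125
            patched[y + dy][x + dx] += strength * weight
    return patched

flat = [[100.0] * 3 for _ in range(3)]
patched = inject_point(flat, 1, 1, threshold=10.0)
print(response(flat, 1, 1), response(patched, 1, 1) > 10.0)
```

The patch is scaled to the smallest strength that clears the threshold plus a margin, which reflects the idea in the talk that the added perturbation should stay small.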
[13:56.410 --> 13:58.710]  This is a method to insert feature points.
[13:59.050 --> 14:00.250]  Let's look at the picture below.
[14:00.370 --> 14:05.710]  You can see that we have inserted 5 feature points into this small part.
[14:07.090 --> 14:08.770]  Or we can insert it into the border.
[14:09.250 --> 14:11.830]  We can also see that we have inserted some feature points into the border.
[14:11.830 --> 14:15.890]  Let me introduce Basic Bricks.
[14:16.510 --> 14:20.550]  In order to reduce the size of the border,
[14:20.550 --> 14:24.100]  we need to insert feature points as densely as possible around the picture.
[14:24.770 --> 14:33.390]  So, we extract some feature points from an image dataset.
[14:33.610 --> 14:37.830]  Then we cluster the local regions around the feature points.
[14:37.830 --> 14:44.070]  We use one of the cluster centers as a Basic Brick.
[14:44.070 --> 14:46.870]  Then we tile it into a pattern.
[14:46.870 --> 14:50.710]  In this way, feature points can be generated more densely.
[14:51.130 --> 14:56.730]  The picture below visualizes the feature points that are generated.
[14:57.010 --> 14:59.310]  You can see they are packed more densely.
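The tiling step can be sketched as below, assuming a brick is simply a small grid of pixel values that is repeated to fill a frame around the picture. How the brick itself is obtained is not shown, and `add_brick_frame` is a hypothetical name rather than anything from the talk.

```python
def add_brick_frame(image, width, brick):
    """Surround `image` with a frame `width` pixels wide, tiled from `brick`.

    `image` and `brick` are lists of rows of pixel values. The original
    pixels are copied unchanged into the centre of the result.
    """
    h, w = len(image), len(image[0])
    bh, bw = len(brick), len(brick[0])
    framed = []
    for y in range(h + 2 * width):
        row = []
        for x in range(w + 2 * width):
            inside = width <= y < width + h and width <= x < width + w
            if inside:
                row.append(image[y - width][x - width])
            else:
                # Repeat the brick periodically across the frame area.
                row.append(brick[y % bh][x % bw])
        framed.append(row)
    return framed

framed = add_brick_frame([[9]], 1, [[0, 1], [2, 3]])
print(framed)  # prints: [[0, 1, 0], [2, 9, 2], [0, 1, 0]]
```

Because only the frame is modified, the picture content stays untouched, and the frame width becomes the single knob that trades visual impact against the number of injected feature points.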
[15:01.650 --> 15:05.250]  Let's evaluate the injection algorithm.
[15:05.250 --> 15:09.350]  In the left picture, we have inserted 20 feature points.
[15:09.350 --> 15:14.010]  The search result in VisualIndex is the picture on the right.
[15:14.450 --> 15:17.650]  This experiment also shows that for some simple pictures,
[15:17.650 --> 15:23.590]  we can use the injection algorithm to realize the bypass.
[15:23.590 --> 15:25.590]  But for some complex pictures,
[15:25.590 --> 15:33.610]  whether you insert the feature points into the center of the picture or into its border,
[15:33.610 --> 15:35.870]  under the premise that you want to keep the center of the picture,
[15:35.870 --> 15:38.490]  there is no way to realize the bypass.
[15:38.490 --> 15:43.230]  So, naturally, we came up with a combination of these two methods.
[15:43.290 --> 15:45.930]  We remove feature points in the center of the picture,
[15:45.930 --> 15:48.710]  and then add some feature points to the border of the picture.
[15:48.770 --> 15:55.210]  On the local system, this experiment succeeds.
[15:56.770 --> 16:01.430]  As you can see, it is feasible to bypass in this local system.
[16:03.610 --> 16:10.210]  Next, let's see if it is possible to bypass in some online picture search systems.
[16:11.210 --> 16:14.430]  So, we chose five picture search engines.
[16:15.170 --> 16:19.510]  Online search engines pose two challenges.
[16:19.510 --> 16:25.390]  The first one is that we really don't know their algorithms.
[16:25.390 --> 16:32.470]  The second challenge is that we only learn whether the search returns a match or not.
[16:32.470 --> 16:33.590]  There are only these two responses.
[16:33.590 --> 16:35.430]  No other parameters are given to us.
[16:35.430 --> 16:39.110]  But after some preliminary test experiments,
[16:39.110 --> 16:45.370]  we got the sense that they probably use feature point extraction methods
[16:45.370 --> 16:48.770]  such as SIFT and SURF.
[16:49.010 --> 16:53.930]  Accordingly, we have three strategies.
[16:53.930 --> 16:59.570]  Removing feature points is the first one.
[17:00.130 --> 17:05.210]  With removal only, if it is a simple picture,
[17:06.250 --> 17:08.910]  the search engine can be bypassed by using the removal algorithm.
[17:10.150 --> 17:12.370]  But for some complicated pictures,
[17:15.750 --> 17:19.410]  we can't remove all the feature points in them.
[17:19.410 --> 17:22.770]  And as I said before, the feature points are very robust.
[17:22.770 --> 17:25.290]  Even if only 10% of the feature points remain,
[17:25.290 --> 17:29.610]  the system can still perform a correct search.
[17:29.610 --> 17:32.730]  So this is not very reliable.
[17:33.050 --> 17:36.790]  The injection-only strategy is the same.
[17:36.790 --> 17:38.810]  For a simple picture,
[17:38.810 --> 17:46.470]  we can add a frame with a width of 15 pixels around it,
[17:46.470 --> 17:51.090]  which can bypass Google image search.
[17:51.090 --> 17:53.030]  But for some complicated pictures,
[17:53.210 --> 17:57.290]  even a frame with a width of 50 pixels cannot achieve the bypass.
[17:57.290 --> 18:02.170]  So we use the combined algorithm.
[18:02.170 --> 18:03.850]  Based on the combined algorithm,
[18:03.850 --> 18:12.090]  we first remove some feature points in the middle of the picture with the RMD algorithm,
[18:13.050 --> 18:16.750]  and then add some feature points around the picture.
[18:18.390 --> 18:23.910]  After that, we can see some feature points in the middle of the picture,
[18:23.910 --> 18:26.210]  and some feature points that have been removed.
[18:26.510 --> 18:29.290]  Finally, we can see the new feature points added by the injection algorithm.
[18:35.150 --> 18:38.070]  And finally, we can try it.
[18:38.230 --> 18:41.890]  The original picture can match a lot of results.
[18:41.890 --> 18:45.510]  But with this new picture, we can't match the results.
[18:45.510 --> 18:48.690]  We also can't match the results when we use Basic Bricks to generate the frame.
[18:51.290 --> 18:54.110]  So let's take a look at the poster.
[18:54.530 --> 18:56.710]  This is the original picture of the poster.
[18:57.310 --> 19:03.450]  In fact, we reduced some feature points in the box,
[19:03.450 --> 19:07.350]  and added some feature points around it.
[19:07.350 --> 19:09.710]  So we can achieve this illusion.
[19:10.270 --> 19:13.330]  Finally, let's talk about the limitation of our experiment.
[19:13.330 --> 19:15.890]  We didn't complete the source-target attack.
[19:15.890 --> 19:24.150]  A source-target attack means that I force the search result to be a chosen target picture.
[19:24.150 --> 19:28.670]  For example, if I increase the angle of the spread model,
[19:28.670 --> 19:35.450]  I can search for the result of another picture with the Google picture search engine.
[19:36.510 --> 19:39.170]  But we didn't achieve this.
[19:39.170 --> 19:42.190]  But we did a proof of concept on our local system.
[19:42.190 --> 19:50.730]  If we move the feature points on the target picture to the local system,
[19:52.150 --> 19:54.010]  we can achieve this.
[19:54.010 --> 19:55.970]  But there is a problem.
[19:55.970 --> 20:00.590]  The visual perturbation is very large.
[20:00.590 --> 20:07.990]  So we will look for a better attack strategy in the future.
[20:07.990 --> 20:19.530]  To sum up, we showed that the threat to CBIR systems exists.
[20:19.530 --> 20:27.250]  And we also showed that the Google image search engine can be bypassed easily.
[20:27.430 --> 20:28.790]  Thank you.
